{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Introducing Pandas\n",
    "\n",
    "Pandas is a Python library that makes handling tabular data easier. Since we're doing data science - this is something we'll use from time to time!\n",
    "\n",
    "It's one of three libraries you'll encounter repeatedly in the field of data science:\n",
    "\n",
    "## Pandas\n",
    "Introduces \"Data Frames\" and \"Series\" that allow you to slice and dice rows and columns of information.\n",
    "\n",
    "## NumPy\n",
    "Usually you'll encounter \"NumPy arrays\", which are multi-dimensional array objects. It is easy to create a Pandas DataFrame from a NumPy array, and Pandas DataFrames can be cast as NumPy arrays. NumPy arrays are mainly important because of...\n",
    "\n",
    "## Scikit_Learn\n",
    "The machine learning library we'll use throughout this course is scikit_learn, or sklearn, and it generally takes NumPy arrays as its input.\n",
    "\n",
    "So, a typical thing to do is to load, clean, and manipulate your input data using Pandas. Then convert your Pandas DataFrame into a NumPy array as it's being passed into some Scikit_Learn function. That conversion can often happen automatically.\n",
    "\n",
    "Let's start by loading some comma-separated value data using Pandas into a DataFrame:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Years Experience</th>\n",
       "      <th>Employed?</th>\n",
       "      <th>Previous employers</th>\n",
       "      <th>Level of Education</th>\n",
       "      <th>Top-tier school</th>\n",
       "      <th>Interned</th>\n",
       "      <th>Hired</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>10</td>\n",
       "      <td>Y</td>\n",
       "      <td>4</td>\n",
       "      <td>BS</td>\n",
       "      <td>N</td>\n",
       "      <td>N</td>\n",
       "      <td>Y</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>N</td>\n",
       "      <td>0</td>\n",
       "      <td>BS</td>\n",
       "      <td>Y</td>\n",
       "      <td>Y</td>\n",
       "      <td>Y</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>7</td>\n",
       "      <td>N</td>\n",
       "      <td>6</td>\n",
       "      <td>BS</td>\n",
       "      <td>N</td>\n",
       "      <td>N</td>\n",
       "      <td>N</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2</td>\n",
       "      <td>Y</td>\n",
       "      <td>1</td>\n",
       "      <td>MS</td>\n",
       "      <td>Y</td>\n",
       "      <td>N</td>\n",
       "      <td>Y</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>20</td>\n",
       "      <td>N</td>\n",
       "      <td>2</td>\n",
       "      <td>PhD</td>\n",
       "      <td>Y</td>\n",
       "      <td>N</td>\n",
       "      <td>N</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Years Experience Employed?  Previous employers Level of Education  \\\n",
       "0                10         Y                   4                 BS   \n",
       "1                 0         N                   0                 BS   \n",
       "2                 7         N                   6                 BS   \n",
       "3                 2         Y                   1                 MS   \n",
       "4                20         N                   2                PhD   \n",
       "\n",
       "  Top-tier school Interned Hired  \n",
       "0               N        N     Y  \n",
       "1               Y        Y     Y  \n",
       "2               N        N     N  \n",
       "3               Y        N     Y  \n",
       "4               Y        N     N  "
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%matplotlib inline\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "df = pd.read_csv(\"http://cdn.sundog-soft.com/RecSys/PastHires.csv\")\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "head() is a handy way to visualize what you've loaded. You can pass it an integer to see some specific number of rows at the beginning of your DataFrame:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Years Experience</th>\n",
       "      <th>Employed?</th>\n",
       "      <th>Previous employers</th>\n",
       "      <th>Level of Education</th>\n",
       "      <th>Top-tier school</th>\n",
       "      <th>Interned</th>\n",
       "      <th>Hired</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>10</td>\n",
       "      <td>Y</td>\n",
       "      <td>4</td>\n",
       "      <td>BS</td>\n",
       "      <td>N</td>\n",
       "      <td>N</td>\n",
       "      <td>Y</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>N</td>\n",
       "      <td>0</td>\n",
       "      <td>BS</td>\n",
       "      <td>Y</td>\n",
       "      <td>Y</td>\n",
       "      <td>Y</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>7</td>\n",
       "      <td>N</td>\n",
       "      <td>6</td>\n",
       "      <td>BS</td>\n",
       "      <td>N</td>\n",
       "      <td>N</td>\n",
       "      <td>N</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2</td>\n",
       "      <td>Y</td>\n",
       "      <td>1</td>\n",
       "      <td>MS</td>\n",
       "      <td>Y</td>\n",
       "      <td>N</td>\n",
       "      <td>Y</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>20</td>\n",
       "      <td>N</td>\n",
       "      <td>2</td>\n",
       "      <td>PhD</td>\n",
       "      <td>Y</td>\n",
       "      <td>N</td>\n",
       "      <td>N</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>0</td>\n",
       "      <td>N</td>\n",
       "      <td>0</td>\n",
       "      <td>PhD</td>\n",
       "      <td>Y</td>\n",
       "      <td>Y</td>\n",
       "      <td>Y</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>5</td>\n",
       "      <td>Y</td>\n",
       "      <td>2</td>\n",
       "      <td>MS</td>\n",
       "      <td>N</td>\n",
       "      <td>Y</td>\n",
       "      <td>Y</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>3</td>\n",
       "      <td>N</td>\n",
       "      <td>1</td>\n",
       "      <td>BS</td>\n",
       "      <td>N</td>\n",
       "      <td>Y</td>\n",
       "      <td>Y</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>15</td>\n",
       "      <td>Y</td>\n",
       "      <td>5</td>\n",
       "      <td>BS</td>\n",
       "      <td>N</td>\n",
       "      <td>N</td>\n",
       "      <td>Y</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>0</td>\n",
       "      <td>N</td>\n",
       "      <td>0</td>\n",
       "      <td>BS</td>\n",
       "      <td>N</td>\n",
       "      <td>N</td>\n",
       "      <td>N</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Years Experience Employed?  Previous employers Level of Education  \\\n",
       "0                10         Y                   4                 BS   \n",
       "1                 0         N                   0                 BS   \n",
       "2                 7         N                   6                 BS   \n",
       "3                 2         Y                   1                 MS   \n",
       "4                20         N                   2                PhD   \n",
       "5                 0         N                   0                PhD   \n",
       "6                 5         Y                   2                 MS   \n",
       "7                 3         N                   1                 BS   \n",
       "8                15         Y                   5                 BS   \n",
       "9                 0         N                   0                 BS   \n",
       "\n",
       "  Top-tier school Interned Hired  \n",
       "0               N        N     Y  \n",
       "1               Y        Y     Y  \n",
       "2               N        N     N  \n",
       "3               Y        N     Y  \n",
       "4               Y        N     N  \n",
       "5               Y        Y     Y  \n",
       "6               N        Y     Y  \n",
       "7               N        Y     Y  \n",
       "8               N        N     Y  \n",
       "9               N        N     N  "
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can also view the end of your data with tail():"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Years Experience</th>\n",
       "      <th>Employed?</th>\n",
       "      <th>Previous employers</th>\n",
       "      <th>Level of Education</th>\n",
       "      <th>Top-tier school</th>\n",
       "      <th>Interned</th>\n",
       "      <th>Hired</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>0</td>\n",
       "      <td>N</td>\n",
       "      <td>0</td>\n",
       "      <td>BS</td>\n",
       "      <td>N</td>\n",
       "      <td>N</td>\n",
       "      <td>N</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>1</td>\n",
       "      <td>N</td>\n",
       "      <td>1</td>\n",
       "      <td>PhD</td>\n",
       "      <td>Y</td>\n",
       "      <td>N</td>\n",
       "      <td>N</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>4</td>\n",
       "      <td>Y</td>\n",
       "      <td>1</td>\n",
       "      <td>BS</td>\n",
       "      <td>N</td>\n",
       "      <td>Y</td>\n",
       "      <td>Y</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>0</td>\n",
       "      <td>N</td>\n",
       "      <td>0</td>\n",
       "      <td>PhD</td>\n",
       "      <td>Y</td>\n",
       "      <td>N</td>\n",
       "      <td>Y</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    Years Experience Employed?  Previous employers Level of Education  \\\n",
       "9                  0         N                   0                 BS   \n",
       "10                 1         N                   1                PhD   \n",
       "11                 4         Y                   1                 BS   \n",
       "12                 0         N                   0                PhD   \n",
       "\n",
       "   Top-tier school Interned Hired  \n",
       "9                N        N     N  \n",
       "10               Y        N     N  \n",
       "11               N        Y     Y  \n",
       "12               Y        N     Y  "
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.tail(4)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We often talk about the \"shape\" of your DataFrame. This is just its dimensions. This particular CSV file has 13 rows with 7 columns per row:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(13, 7)"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The total size of the data frame is the rows * columns:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "91"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.size"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The len() function gives you the number of rows in a DataFrame:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "13"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If your DataFrame has named columns (in our case, extracted automatically from the first row of a .csv file,) you can get an array of them back:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['Years Experience', 'Employed?', 'Previous employers',\n",
       "       'Level of Education', 'Top-tier school', 'Interned', 'Hired'],\n",
       "      dtype='object')"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.columns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Extracting a single column from your DataFrame looks like this - this gives you back a \"Series\" in Pandas:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     Y\n",
       "1     Y\n",
       "2     N\n",
       "3     Y\n",
       "4     N\n",
       "5     Y\n",
       "6     Y\n",
       "7     Y\n",
       "8     Y\n",
       "9     N\n",
       "10    N\n",
       "11    Y\n",
       "12    Y\n",
       "Name: Hired, dtype: object"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['Hired']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can also extract a given range of rows from a named column, like so:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    Y\n",
       "1    Y\n",
       "2    N\n",
       "3    Y\n",
       "4    N\n",
       "Name: Hired, dtype: object"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['Hired'][:5]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Or even extract a single value from a specified column / row combination:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Y'"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['Hired'][5]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To extract more than one column, you pass in an array of column names instead of a single one:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Years Experience</th>\n",
       "      <th>Hired</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>10</td>\n",
       "      <td>Y</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>Y</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>7</td>\n",
       "      <td>N</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2</td>\n",
       "      <td>Y</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>20</td>\n",
       "      <td>N</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>0</td>\n",
       "      <td>Y</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>5</td>\n",
       "      <td>Y</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>3</td>\n",
       "      <td>Y</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>15</td>\n",
       "      <td>Y</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>0</td>\n",
       "      <td>N</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>1</td>\n",
       "      <td>N</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>4</td>\n",
       "      <td>Y</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>0</td>\n",
       "      <td>Y</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    Years Experience Hired\n",
       "0                 10     Y\n",
       "1                  0     Y\n",
       "2                  7     N\n",
       "3                  2     Y\n",
       "4                 20     N\n",
       "5                  0     Y\n",
       "6                  5     Y\n",
       "7                  3     Y\n",
       "8                 15     Y\n",
       "9                  0     N\n",
       "10                 1     N\n",
       "11                 4     Y\n",
       "12                 0     Y"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[['Years Experience', 'Hired']]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can also extract specific ranges of rows from more than one column, in the way you'd expect:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Years Experience</th>\n",
       "      <th>Hired</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>10</td>\n",
       "      <td>Y</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>Y</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>7</td>\n",
       "      <td>N</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2</td>\n",
       "      <td>Y</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>20</td>\n",
       "      <td>N</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Years Experience Hired\n",
       "0                10     Y\n",
       "1                 0     Y\n",
       "2                 7     N\n",
       "3                 2     Y\n",
       "4                20     N"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[['Years Experience', 'Hired']][:5]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Sorting your DataFrame by a specific column looks like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Years Experience</th>\n",
       "      <th>Employed?</th>\n",
       "      <th>Previous employers</th>\n",
       "      <th>Level of Education</th>\n",
       "      <th>Top-tier school</th>\n",
       "      <th>Interned</th>\n",
       "      <th>Hired</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>N</td>\n",
       "      <td>0</td>\n",
       "      <td>BS</td>\n",
       "      <td>Y</td>\n",
       "      <td>Y</td>\n",
       "      <td>Y</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>0</td>\n",
       "      <td>N</td>\n",
       "      <td>0</td>\n",
       "      <td>PhD</td>\n",
       "      <td>Y</td>\n",
       "      <td>Y</td>\n",
       "      <td>Y</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>0</td>\n",
       "      <td>N</td>\n",
       "      <td>0</td>\n",
       "      <td>BS</td>\n",
       "      <td>N</td>\n",
       "      <td>N</td>\n",
       "      <td>N</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>0</td>\n",
       "      <td>N</td>\n",
       "      <td>0</td>\n",
       "      <td>PhD</td>\n",
       "      <td>Y</td>\n",
       "      <td>N</td>\n",
       "      <td>Y</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>1</td>\n",
       "      <td>N</td>\n",
       "      <td>1</td>\n",
       "      <td>PhD</td>\n",
       "      <td>Y</td>\n",
       "      <td>N</td>\n",
       "      <td>N</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2</td>\n",
       "      <td>Y</td>\n",
       "      <td>1</td>\n",
       "      <td>MS</td>\n",
       "      <td>Y</td>\n",
       "      <td>N</td>\n",
       "      <td>Y</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>3</td>\n",
       "      <td>N</td>\n",
       "      <td>1</td>\n",
       "      <td>BS</td>\n",
       "      <td>N</td>\n",
       "      <td>Y</td>\n",
       "      <td>Y</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>4</td>\n",
       "      <td>Y</td>\n",
       "      <td>1</td>\n",
       "      <td>BS</td>\n",
       "      <td>N</td>\n",
       "      <td>Y</td>\n",
       "      <td>Y</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>5</td>\n",
       "      <td>Y</td>\n",
       "      <td>2</td>\n",
       "      <td>MS</td>\n",
       "      <td>N</td>\n",
       "      <td>Y</td>\n",
       "      <td>Y</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>7</td>\n",
       "      <td>N</td>\n",
       "      <td>6</td>\n",
       "      <td>BS</td>\n",
       "      <td>N</td>\n",
       "      <td>N</td>\n",
       "      <td>N</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>10</td>\n",
       "      <td>Y</td>\n",
       "      <td>4</td>\n",
       "      <td>BS</td>\n",
       "      <td>N</td>\n",
       "      <td>N</td>\n",
       "      <td>Y</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>15</td>\n",
       "      <td>Y</td>\n",
       "      <td>5</td>\n",
       "      <td>BS</td>\n",
       "      <td>N</td>\n",
       "      <td>N</td>\n",
       "      <td>Y</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>20</td>\n",
       "      <td>N</td>\n",
       "      <td>2</td>\n",
       "      <td>PhD</td>\n",
       "      <td>Y</td>\n",
       "      <td>N</td>\n",
       "      <td>N</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    Years Experience Employed?  Previous employers Level of Education  \\\n",
       "1                  0         N                   0                 BS   \n",
       "5                  0         N                   0                PhD   \n",
       "9                  0         N                   0                 BS   \n",
       "12                 0         N                   0                PhD   \n",
       "10                 1         N                   1                PhD   \n",
       "3                  2         Y                   1                 MS   \n",
       "7                  3         N                   1                 BS   \n",
       "11                 4         Y                   1                 BS   \n",
       "6                  5         Y                   2                 MS   \n",
       "2                  7         N                   6                 BS   \n",
       "0                 10         Y                   4                 BS   \n",
       "8                 15         Y                   5                 BS   \n",
       "4                 20         N                   2                PhD   \n",
       "\n",
       "   Top-tier school Interned Hired  \n",
       "1                Y        Y     Y  \n",
       "5                Y        Y     Y  \n",
       "9                N        N     N  \n",
       "12               Y        N     Y  \n",
       "10               Y        N     N  \n",
       "3                Y        N     Y  \n",
       "7                N        Y     Y  \n",
       "11               N        Y     Y  \n",
       "6                N        Y     Y  \n",
       "2                N        N     N  \n",
       "0                N        N     Y  \n",
       "8                N        N     Y  \n",
       "4                Y        N     N  "
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.sort_values(['Years Experience'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can break down the number of unique values in a given column into a Series using value_counts() - this is a good way to understand the distribution of your data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "BS     7\n",
       "PhD    4\n",
       "MS     2\n",
       "Name: Level of Education, dtype: int64"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "degree_counts = df['Level of Education'].value_counts()\n",
    "degree_counts"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Pandas even makes it easy to plot a Series or DataFrame - just call plot():"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<matplotlib.axes._subplots.AxesSubplot at 0x22090b3a748>"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAW4AAAEGCAYAAABFBX+4AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAADJdJREFUeJzt3W2MpfVZx/Hvjx0QipSaMlpaOl2NDaSCPDghGEyTQmtADBo1KWu0DzaZ+MIKSWND65sW39TEmFqfkk2fSK1QrBANbdcSEWtVqLMUu2wXkkqgIFUGG55aUwQuX8xZXIeZOffAnDlzzX4/yYTzcOfMFU7mu/f5n/vcJ1WFJKmPY6Y9gCRpYwy3JDVjuCWpGcMtSc0YbklqxnBLUjOGW5KaMdyS1IzhlqRmZibxoKecckrt3r17Eg8tSTvS/v37H62q2SHbTiTcu3fvZnFxcRIPLUk7UpIHhm7rUokkNWO4JakZwy1JzRhuSWrGcEtSM2PDneT0JHcd8fNEkqu2YjhJ0guNPRywqu4FzgFIsgv4d+CmCc8lSVrDRpdKLgb+raoGH28oSdpcGw33FcB1kxhEkjTM4E9OJjkOuBx43xr3LwALAHNzc5sy3BC7r/7clv2uabj/Q5dNewRJ28xG9rgvBe6sqv9c7c6q2ltV81U1Pzs76OP2kqQXYSPh3oPLJJI0dYPCneRlwFuAGyc7jiRpnEFr3FX1XeCVE55FkjSAn5yUpGYMtyQ1Y7glqRnDLUnNGG5JasZwS1IzhluSmjHcktSM4ZakZgy3JDVjuCWpGcMtSc0YbklqxnBLUjOGW5KaMdyS1IzhlqRmDLckNWO4JakZwy1JzRhuSWpmULiTvCLJZ5Pck+RQkp+c9GCSpNXNDNzuD4B9VfVLSY4DXjbBmSRJ6xgb7iQvB94IvAOgqp4Gnp7sWJKktQxZKvkRYAn4RJKvJvlokhNXbpRkIcliksWlpaVNH1SStGxIuGeA84A/rapzge8AV6/cqKr2VtV8Vc3Pzs5u8piSpMOGhPsh4KGqumN0/bMsh1ySNAVjw11V/wE8mOT00U0XA1+f6FSSpDUNPark3cCnR0eU3Ae8c3IjSZLWMyjcVXUXMD/hWSRJA/jJSUlqxnBLUjOGW5KaMdyS1IzhlqRmDLckNWO4JakZwy1JzRhuSWrGcEtSM4Zbkpox3JLUjOGWpGYMtyQ1Y7glqRnDLUnNGG5JasZwS1IzhluSmjHcktTMoC8LTnI/8CTwLPBMVfnFwZI0JYPCPfKmqnp0YpNIkgZxqUSSmhka7gK+mGR/koVJDiRJWt/QpZILq+rhJD8I3JLknqr60pEbjIK+ADA3N7fJY0qSDhu0x11VD4/++whwE3D+Ktvsrar5qpqfnZ3d3CklSc8bG+4kJyY56fBl4KeBuyc9mCRpdUOWSn4IuCnJ4e3/vKr2TXQqSdKaxoa7qu4Dzt6CWSRJA3g4oCQ1Y7glqRnDLUnNGG5JasZwS1IzhluSmjHcktSM4ZakZgy3JDVjuCWpGcMtSc0YbklqxnBLUjOGW5KaMdyS1IzhlqRmDLckNWO4JakZwy1JzRhuSWrGcEtSM4PDnWRXkq8muXmSA0mS1reRPe4rgUOTGkSSNMygcCc5DbgM+Ohkx5EkjTMzcLsPA+8FTlprgyQLwALA3NzcS59MR4cPnDztCSbrA49PewLtQGP3uJP8LPBIVe1fb7uq2ltV81U1Pzs7u2kDSpL+vyFLJRcClye5H7geuCjJn010KknSmsaGu6reV1WnVdVu4Arg1qr6lYlPJklalcdxS1IzQ9+cBKCqbgNum8gkkqRB3OOWpGYMtyQ1Y7glqRnDLUnNGG5JasZwS1IzhluSmjHcktSM4ZakZgy3JDVjuCWpGcMtSc0YbklqxnBLUjOGW5KaMdyS1IzhlqRmDLckNWO4JakZwy1JzYwNd5Ljk3wlyb8mOZjkg1sxmCRpdUO+5f17wEVV9VSSY4EvJ/lCVd0+4dkkSasYG+6qKuCp0dVjRz81yaEkSWsbtMadZFeSu4BHgFuq6o7JjiVJWsugcFfVs1V1DnAacH6SM1duk2QhyWKSxaWlpc2eU5I0sqGjSqrqMeA24JJV7ttbVfNVNT87O7tJ40mSVhpyVMlskleMLp8AvBm4Z9KDSZJWN+SoklOBa5PsYjn0N1TVzZMdS5K0liFHlXwNOHcLZpEkDeAnJyWpGcMtSc0YbklqxnBLUjOGW5KaMdyS1IzhlqRmDLckNWO4JakZwy1JzRhuSWrGcEtSM4Zbkpox3JLUjOGWpGYMtyQ1Y7glqRnDLUnNGG5JasZwS1IzhluSmhkb7iSvTfJ3SQ4lOZjkyq0YTJK0upkB2zwDvKeq7kxyErA/yS1V9fUJzyZJWsXYPe6q+lZV3Tm6/CRwCHjNpAeTJK1uQ2vcSXYD5wJ3TGIYSdJ4Q5ZKAEjy/cBfAldV1ROr3L8ALADMzc1t2oCStq+zrj1r2iNMzIG3H5j2CGsatMed5FiWo/3pqrpxtW2qam9VzVfV/Ozs7GbOKEk6wpCjSgJ8DDhUVb8/+ZEkSesZssd9IfCrwEVJ7hr9/MyE55IkrWHsGndVfRnIFswiSRrAT05KUjOGW5KaMdyS1IzhlqRmDLckNWO4JakZwy1JzRhuSWrGcEtSM4Zbkpox3JLUjOGWpGYMtyQ1Y7glqRnDLUnNGG5JasZwS1IzhluSmjHcktSM4ZakZgy3JDUzNtxJPp7kkSR3b8VAkqT1Ddnj/iRwyYTnkCQNNDbcVfUl4NtbMIskaYBNW+NOspBkMcni0tLSZj2sJGmFTQt3Ve2tqvmqmp+dnd2sh5UkreBRJZLUjOGWpGaGHA54HfDPwOlJHkryrsmPJUlay8y4Dapqz1YMIkkaxqUSSWrGcEtSM4Zbkpox3JLUjOGWpGYMtyQ1Y7glqRnDLUnNGG5JasZwS1IzhluSmjHcktSM4ZakZgy3JDVjuCWpGcMtSc0YbklqxnBLUjOGW5KaMdyS1MygcCe5JMm9Sb6R5OpJDyVJWtvYcCfZBfwxcCnwBmBPkjdMejBJ0uqG7HGfD3yjqu6rqqeB64Gfm+xYkqS1DAn3a4AHj7j+0Og2SdIUzAzYJqvcVi/YKFkAFkZXn0py70sZbBs7BXh0q35ZfnerftNRY0ufPz642p+PXoIte/7yji1/7l43dMMh4X4IeO0R108DHl65UVXtBfYO/cVdJVmsqvlpz6EXx+evN5+/ZUOWSv4FeH2SH05yHHAF8NeTHUuStJaxe9xV9UyS3wD+BtgFfLyqDk58MknSqoYslVBVnwc+P+FZutjxy0E7nM9fbz5/QKpe8D6jJGkb8yPvktSM4ZakZgy3JDUz6M3Jo1WS1wGPVdXjo+tvAn4eeAD4o9EpALSNJZlh+Tw7Z4xuOgTsq6pnpjeVXowkrwTeCHyzqvZPe55pco97fTcAJwIkOQf4C+CbwNnAn0xxLg2Q5NXAQeA9wKtZPlXDbwEHR/dpG0tyc5IzR5dPBe4Gfg34VJKrpjrclHlUyTqSfK2qfnx0+feA56rqvUmOAe46fJ+2pySfZPl5+vCK238T+ImqevtUBtMgSQ5W1Y+NLr8fOKOq3pbkJOAfj+a/P/e413fkyQouAv4WoKqem8442qALVkYboKo+AlwwhXm0Mf9zxOWLGX2WpKqeBI7qv0HXuNd3a5IbgG8BPwDcCs+/bHN9e/v773Xu++6WTaEX68Ek72b5fEnnAfsAkpwAHDvNwabNcK/vKuCtwKnAT1XV4T2AVwG/PbWpNNTJSX5hldsDvHyrh9GGvQu4Bngz8Naqemx0+wXAJ6Y21TbgGvcGJTkF+K/yf9y2l2TdP+6qeudWzSJtJsO9jiQXAB8Cvg38DvApls8HfAzwtqraN8XxpB0tybpnIa2qy7dqlu3GcK8jySLwfuBklk9uc2lV3Z7kDOC6qjp3qgNqkCTfB/wisJsjlger6pppzaTxkiyx/O1b1wF3sOJLXarq76cx13bgGvf6ZqrqiwBJrqmq2wGq6p7EbzZp5K+Ax4H9wPemPIuGexXwFmAP8MvA51jeYTrqTyttuNd35CFHK49Q8KVKH6dV1SXTHkIbU1XPsnwkyb7Rq6Y9wG2jnag/nO5002W413d2kidYfol2wugyo+vHT28sbdA/JTmrqg5MexBtzCjYl7Ec7d3AR4AbpznTduAat3asJAdYfmU0A7weuI/lpZIAdTR/8q6DJNcCZwJfAK6vqrunPNK2Ybi1Y41OEramqnpgq2bRxiV5DvjO6OqRoTr8D+9Reyy+4daOleR44NeBHwUOAB/zrIDaCQy3dqwkn2H5fBf/wPKpXR+oqiunO5X00hlu7VhJDlTVWaPLM8BXquq8KY8lvWSeHVA72fNnl3OJRDuJe9zasZI8y/+9uRXgBJbPCnjUv7ml3gy3JDXjUokkNWO4JakZwy1JzRhuSWrGcEtSM/8Lq9rO+y7GTvEAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "degree_counts.plot(kind='bar')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercise\n",
    "\n",
    "Try extracting rows 5-10 of our DataFrame, preserving only the \"Previous Employers\" and \"Hired\" columns. Assign that to a new DataFrame, and create a histogram plotting the distribution of the previous employers in this subset of the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
